feat(knowledge): introduce local Orama persistence (feature-flagged)#1015
Conversation
|
Thanks for narrowing this down from the earlier broader knowledge/RAG direction. This is a much better shape from a trust-boundary perspective because it is local-only and feature-flagged. Before a real review, though, this needs a refresh:
Could you rebase onto latest
Not blocking the direction conceptually, just asking for a clean reviewable head first. |
|
please rebase to main and fix conflicts |
bf8cb4f to
ad07527
Compare
|
Build Fix Summary: Module Resolution & Circular Dependencies The build failure in the CI (bun run smoke) was caused by a module resolution error following the architectural changes made to break a circular dependency.
The PR is now in a "Green" state and ready for review. |
|
hello @techbrewboss @jatmn kindly have a look too when you have time. |
Give me about an hour or so. Got you |
|
Review note from the current head:
Can we initialize Orama before the first knowledge graph write/search, and add a feature-flagged test that proves the |
Vasanthdev2004
left a comment
There was a problem hiding this comment.
Thanks for the follow-up and rebase. I did a targeted current-head review of ad07527, focused on the feature-flagged Orama path, persistence behavior, and async knowledge/arc call sites.
Review scope: Targeted review of the new local Orama persistence path and its feature-flag behavior.
Verdict: Needs changes
Blocking issue:
- The Orama backend appears to never initialize.
src/utils/knowledgeGraph.tsexportsinitOrama(cwd), but on the current head it has no call site. The write/search branches are all guarded byisOramaEnabled() && oramaDb, so withOPENCLAUDE_KNOWLEDGE_ORAMA=1,oramaDbstill remainsnullunless someone manually callsinitOrama(). That meansaddGlobalEntity(),addGlobalSummary(),searchGlobalGraph(), andgetOrchestratedMemory()silently stay on the JSON path instead of exercising the new Orama persistence/search path.
What I checked:
- Current head:
ad07527 rg "initOrama\("only finds the exported function definition insrc/utils/knowledgeGraph.ts.- The Orama insert/search branches all require
oramaDbto already be non-null. - Existing tests cover the async JSON-path behavior, but I do not see a feature-flagged regression test proving
OPENCLAUDE_KNOWLEDGE_ORAMA=1creates/restores/searches the.oramapersistence file.
Please initialize Orama before the first feature-flagged write/search path, and add focused coverage that proves:
- default behavior remains JSON-only when the flag is unset
- with
OPENCLAUDE_KNOWLEDGE_ORAMA=1, the Orama store is initialized, persisted, restored, and actually used for search
Non-blocking note:
- I like that this version is local-only and feature-flagged. Once initialization and coverage are fixed, the remaining review can stay focused on persistence path safety and async compatibility.
ad07527 to
02e7bd9
Compare
|
Thank you for the detailed review! I have addressed the blocking issues regarding initialization and added the requested test coverage. Key Changes:
Verification:
|
Vasanthdev2004
left a comment
There was a problem hiding this comment.
Scope: Targeted re-review of the current Orama head (02e7bd9) against the earlier initialization blocker and the new async/persistence paths.
Verdict: Needs changes
Good progress: the original blocker I saw earlier is mostly addressed now. initOrama() is no longer dead code; it is reached from the Orama-enabled insert/search paths, and the focused knowledge/conversation tests pass locally after installing the new locked dependencies.
Blocking issues:
finalizeArcTurn()did not finish the async conversion. It still returnsvoidand callsaddGlobalSummary(summaryContent, keywords)withoutawait, whilequery.tsstill callsfinalizeArcTurn()withoutawait. SinceaddGlobalSummary()now owns async Orama init/insert/save work, the turn summary can be fire-and-forgeted and race with the next loop/process shutdown. Please makefinalizeArcTurn()async, awaitaddGlobalSummary(), update thequery.tscall site, and make the test assert the awaited path intentionally./knowledge clear/resetGlobalGraph()only clears the JSON graph path. WithOPENCLAUDE_KNOWLEDGE_ORAMA=1, the newknowledge.oramapersistence file and in-memoryoramaDbcan survive the clear path, so cleared memory can still be restored/searched later. Please clear the Orama persistence file and in-memory DB as part of the reset/clear flow, and add a regression test with the Orama flag enabled.
Verification I ran:
bun install --frozen-lockfilebun test src/utils/knowledgeGraph.test.ts src/utils/conversationArc.test.ts src/commands/knowledge/knowledge.test.tspassed: 25/25
Happy to re-review once those two persistence/async semantics are tightened.
|
Changes summary:
|
|
@Vasanthdev2004 ready for review again :) |
gnanam1990
left a comment
There was a problem hiding this comment.
Re-reviewed the current head against the two blockers in Vasanthdev2004's last review. Both look addressed: (a) finalizeArcTurn() in src/utils/conversationArc.ts is now async and src/query.ts awaits it; addGlobalSummary() is awaited inside it. (b) resetGlobalGraph() in src/utils/knowledgeGraph.ts now rmSyncs getOramaPersistencePath(cwd) and nulls oramaDb when OPENCLAUDE_KNOWLEDGE_ORAMA=1, with a regression test in knowledgeGraph.test.ts.
Lazy await initOrama(getFsImplementation().cwd()) is invoked at every Orama write/search site, so the original initialization gap is closed. Flag default is correctly off (only === '1' enables).
Two non-blocking notes:
- Branch is CONFLICTING against main and needs rebase before merge.
initOramaswallowsrestore()failures withconsole.errorthen proceeds tocreate()— a corrupted persistence file silently resets project memory; consider renaming the bad file rather than overwriting on next save.
No red-flag rule hits. Verified locally: diffed knowledgeGraph.ts/conversationArc.ts/query.ts at HEAD, confirmed both prior reviewers' blockers addressed.
- Added @orama/orama and persistence plugin. - Implemented optional local-only Orama backend in knowledgeGraph.ts. - Gated Orama logic behind OPENCLAUDE_KNOWLEDGE_ORAMA=1. - Converted knowledge and conversation arc functions to async. - Fixed circular dependency between knowledgeGraph and sessionStorage by moving getProjectsDir to envUtils. - Updated all call sites and tests to handle async Knowledge API. - Verified build and tests pass on latest main.
6cdc841 to
a74c826
Compare
|
Hello @gnanam1990 thanks for the review. I have completed the requested corrections for the PR. Summary of Actions:
Ready again! |
…ArcTurn and Orama cleanup)
a74c826 to
cad6339
Compare
Vasanthdev2004
left a comment
There was a problem hiding this comment.
Targeted re-review of the current head (cad6339).
Verdict: Approve-ready
What I checked:
- Rechecked the two blockers from my prior review.
finalizeArcTurn()is now async, awaitsaddGlobalSummary(), andsrc/query.tsawaitsfinalizeArcTurn()at turn finalization. - Rechecked the Orama reset path.
resetGlobalGraph()now removesknowledge.oramaand clears the in-memoryoramaDbwhenOPENCLAUDE_KNOWLEDGE_ORAMA=1, with regression coverage. - Rechecked the feature flag boundary. Orama remains opt-in only via
OPENCLAUDE_KNOWLEDGE_ORAMA === '1', and the default JSON path remains active when the flag is absent. - Rechecked production call sites for the now-async knowledge/arc APIs; the relevant runtime call sites are awaited.
Validation I ran locally:
bun test src/utils/knowledgeGraph.test.ts src/utils/conversationArc.test.tsbun run buildbun run smoke
Note: I also tried including src/commands/knowledge/knowledge.test.ts, but that hits the existing missing yolo-classifier-prompts/auto_mode_system_prompt.txt import path that is already present on main, so I am not treating that as a blocker for this PR. One small non-blocking cleanup: that command test still calls the now-async addEntity() without await, so it would be worth updating when that test file is cleaned up.
I do not see a remaining blocker on the current head.
Vasanthdev2004
left a comment
There was a problem hiding this comment.
Targeted re-review of current head 7ffd64a0e6b8dbd58f4f989d47af18d0d207cd4a, focused on the new stress-test/corruption-recovery commit after my previous approval.
Verdict: Needs changes
Blocking issue:
- CI is red on the current head. The full suite fails in
conversationArc > Knowledge Graph > automatically learns facts from message contentwithDOCUMENT_ALREADY_EXISTSfrom Orama insert duringinitOrama().
The likely cause is the new src/utils/knowledgeGraph.stress.test.ts leaking OPENCLAUDE_KNOWLEDGE_ORAMA=1 into later tests. It saves/restores CLAUDE_CONFIG_DIR, but beforeEach() sets process.env.OPENCLAUDE_KNOWLEDGE_ORAMA = '1' and afterAll() never restores/deletes that env var. In CI order, the stress test runs before conversationArc.test.ts, so the existing JSON-path conversation-arc test unexpectedly runs with Orama enabled and hits duplicate Orama document insertion.
Please save the original OPENCLAUDE_KNOWLEDGE_ORAMA value and restore it in afterAll() (and keep clearing the in-memory graph/Orama state), then rerun the exact CI command: bun test --max-concurrency=1.
What I checked:
- The new commit since my prior approval only changes
src/utils/knowledgeGraph.tsand addssrc/utils/knowledgeGraph.stress.test.ts. bun test src/utils/conversationArc.test.ts --max-concurrency=1passes by itself.- GitHub CI fails with the Orama duplicate-document error after the stress test runs.
- Local
bun test --max-concurrency=1also fails on this branch, though my local env exposes additional profile-test pollution; CI's blocker is enough by itself.
Happy to re-review after the stress test restores the env/state cleanly and CI goes green.
|
Hello @Vasanthdev2004 I have fixed the CI failure by preventing environment variable pollution from the stress test. What was fixed:
Verification Results:
|
|
Architectural Improvements:
|
jatmn
left a comment
There was a problem hiding this comment.
Request For Fix
[P1] Serialize Orama document mutations, not just saves
File: src/utils/knowledgeGraph.ts
Lines: 231-246
The latest concurrency hardening adds an init lock and a save queue, but Orama mutations themselves can still run concurrently. Updating an existing entity performs a remove(oramaDb, id) followed by insert(oramaDb, ...) without any mutation lock. If two updates for the same entity overlap, both calls can remove first, then both try to insert the same id; one succeeds and the other throws DOCUMENT_ALREADY_EXISTS.
This can happen through concurrent fact extraction or any future parallel knowledge writes. It means a normal knowledge update path can reject even though the JSON graph was already mutated and saved, leaving the Orama path unreliable under the concurrency scenario this PR is trying to harden.
Local repro against this PR head:
bun --bun - <<'EOF'
import { mkdtempSync, rmSync } from 'fs'
import { tmpdir } from 'os'
import { join } from 'path'
process.env.CLAUDE_CONFIG_DIR = mkdtempSync(join(tmpdir(), 'orama-race-'))
process.env.OPENCLAUDE_KNOWLEDGE_ORAMA='1'
const kg = await import('./src/utils/knowledgeGraph.ts')
await kg.addGlobalEntity('tool','same',{base:'1'})
let errors=[]
await Promise.all(Array.from({length:100}, async (_,i)=>{
try { await kg.addGlobalEntity('tool','same',{['k'+i]:String(i)}) }
catch(e) { errors.push(String(e)) }
}))
console.log('errors', errors.length, errors.slice(0,3))
rmSync(process.env.CLAUDE_CONFIG_DIR,{recursive:true, force:true})
EOFObserved result:
errors 99 [ "Error: A document with id \"entity_...\" already exists.", ... ]
Please put the Orama remove/insert/save sequence behind a single serialized mutation queue or per-document lock, and add a regression test that performs concurrent updates to the same entity with OPENCLAUDE_KNOWLEDGE_ORAMA=1. The test should assert no thrown errors and that the restored Orama index contains the final merged attributes.
Verification Run
Passed:
bun install --frozen-lockfile
bun test src/utils/knowledgeGraph.test.ts src/utils/knowledgeGraph.stress.test.ts src/utils/conversationArc.test.ts --max-concurrency=1
bun run buildAlso ran:
bun test --max-concurrency=1This failed with 12 failures in existing profile/SDK consumer type tests. I did not see those failures tied to the knowledge/Orama changes during this review; the focused PR tests and build passed.
…ized I/O, and consolidated state
5703b6e to
b221ee6
Compare
|
Hello @jatmn
|
Vasanthdev2004
left a comment
There was a problem hiding this comment.
Targeted re-review of current head b221ee6, focused on the Orama persistence fixes and the previously discussed initialization/cleanup/concurrency blockers.
Verdict: Approve-ready from my side
What I checked:
- Orama remains feature-flagged behind
OPENCLAUDE_KNOWLEDGE_ORAMA === '1'; the JSON path remains the default when the flag is unset. initOrama()is now reached from the write/search paths instead of being dead code.- Corrupted Orama persistence is renamed before a fresh DB is created, rather than silently overwriting the bad file.
- Mutations are serialized through
mutationQueue, including remove/insert/save, which addresses the same-entity update race raised in review. resetGlobalGraph()clears both JSON and Orama persistence so stale.oramastate does not survive disabled/re-enabled flag cycles.finalizeArcTurn()is now async and awaited from the query loop before the next turn state is built.- New Orama dependencies are declared and kept external in the build externals list.
Validation:
bun test ./src/utils/knowledgeGraph.test.ts ./src/utils/knowledgeGraph.stress.test.ts ./src/utils/conversationArc.test.ts --max-concurrency=1- Result: 28/28 passing.
I did not do a full clean-slate review of every knowledge/RAG behavior in this large PR, but I do not see a remaining blocker in the areas that were previously holding this up.
kevincodex1
left a comment
There was a problem hiding this comment.
This looks good to me now.
|
maybe we just make orama as default and then sunset JSON file-store can you have a research on this bro @3kin0x what will be the possible impact? |
…itlawb#1015) * feat(knowledge): introduce local Orama persistence (clean phase 1) - Added @orama/orama and persistence plugin. - Implemented optional local-only Orama backend in knowledgeGraph.ts. - Gated Orama logic behind OPENCLAUDE_KNOWLEDGE_ORAMA=1. - Converted knowledge and conversation arc functions to async. - Fixed circular dependency between knowledgeGraph and sessionStorage by moving getProjectsDir to envUtils. - Updated all call sites and tests to handle async Knowledge API. - Verified build and tests pass on latest main. * fix: address PR review comments for knowledge feature (async finalizeArcTurn and Orama cleanup) * test: add comprehensive stress and edge case testing for Orama Knowledge Graph * fix: prevent test pollution by restoring Orama env flag in stress test * refactor: harden Knowledge architecture with concurrency locks, optimized I/O, and consolidated state
Summary
Impact
Testing
Notes
gating).
@Vasanthdev2004
@gnanam1990
@kevincodex1